Machine learning algorithm to predict the in-hospital mortality in critically ill patients with chronic kidney disease

Abstract
Background: This study aimed to establish and validate a machine learning (ML) model for predicting in-hospital mortality in critically ill patients with chronic kidney disease (CKD).
Methods: This study collected data on CKD patients from 2008 to 2019 using the Medical Information Mart for Intensive Care IV. Six ML approaches were used to build the model. Accuracy and area under the curve (AUC) were used to choose the best model. In addition, the best model was interpreted using SHapley Additive exPlanations (SHAP) values.
Results: There were 8527 CKD patients eligible for participation; the median age was 75.1 (interquartile range: 65.0–83.5) years, and 61.7% (5259/8527) were male. We developed six ML models with clinical variables as input factors. Among the six models developed, the eXtreme Gradient Boosting (XGBoost) model had the highest AUC, at 0.860. According to the SHAP values, the sequential organ failure assessment score, urine output, respiratory rate, and simplified acute physiology score II were the four most influential variables in the XGBoost model.
Conclusions: We successfully developed and validated ML models for predicting mortality in critically ill patients with CKD. Among all ML models, the XGBoost model is the most effective ML model that can help clinicians accurately manage and implement early interventions, which may reduce mortality in critically ill CKD patients with a high risk of death.

Background
Chronic kidney disease (CKD) has become a severe global health concern, with roughly 700 million individuals currently suffering from CKD [1]. CKD is one of the world's top 10 leading causes of mortality, impacting around 15% of the adult population [2]. As the intensive care unit (ICU) population shifts, many patients have preexisting CKD [3,4]. In recent years, despite more medical resources devoted to treating CKD, the life expectancy of CKD patients remains much lower than that of the general population [5]. According to one study, the risk of death is three times greater for patients with CKD in the ICU than for those without CKD [6]. Early identification of CKD patients at high risk for clinical deterioration is of great importance and may help to deliver proper care and optimize the use of limited resources [7]. Thus, clinical practice may benefit from developing predictive models that can accurately predict an individual's survival prognosis.
Machine learning (ML) algorithms may offer an opportunity to reduce the risk of death in critically ill patients with CKD through their ability to analyze the vast amount of data in electronic health records. These data may include patient diagnoses, demographics, routinely obtained measures, and therapies. These data-driven methods can handle high-dimensional data, analyze complex relationships, and isolate essential predictors of outcomes. They are more flexible than traditional modeling techniques, which require predictors to be independent of each other and use variables selected primarily based on statistical significance or clinical importance [8,9]. In recent years, ML methods have been widely used in the prognostic assessment of diseases [10][11][12]. With a well-built prediction model, clinicians may better screen for and identify patients at high risk of adverse outcomes, allowing for more prompt intervention and better outcomes. To our knowledge, however, no ML model has been developed to predict in-hospital mortality among critically ill CKD patients. This research aimed to establish and validate an ML model for predicting in-hospital mortality in critically ill patients with CKD.

Database introduction
The Medical Information Mart for Intensive Care IV (MIMIC-IV) database is a comprehensive, anonymized clinical dataset approved by the Massachusetts Institute of Technology [13]. The MIMIC-IV database contains data on all Beth Israel Deaconess Medical Center ICU patients between 2008 and 2019. Because all patients in the database are anonymized and the study had no impact on clinical decision-making, the requirement for informed patient consent was waived [14]. One author (XL) passed the Protecting Human Research Participants exam of the National Institutes of Health (record ID: 35970146) and was granted access to the MIMIC-IV database.

Study population
This research comprised all patients diagnosed with CKD who were enrolled in MIMIC-IV. The diagnosis of CKD was based on the International Classification of Diseases, Ninth Revision (ICD-9) codes (5851, 5852, 5853, 5854, 5855, 5856, 5859) and the International Classification of Diseases, Tenth Revision (ICD-10) codes (N18, N181, N182, N183, N184, N185, N186, and N189), which were recorded by hospital staff at the time of patient discharge. For patients admitted to the ICU more than once, only the first admission was included. We excluded patients younger than 18 years and those who spent less than 24 h in the ICU.
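The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (keys such as `subject_id`, `intime`, `outtime`, `age`, and `icd_codes`) is hypothetical and is not the MIMIC-IV schema, and diagnosis codes are attached directly to each ICU stay for simplicity.

```python
from datetime import datetime, timedelta

# ICD-9 and ICD-10 codes for CKD as listed in the text (no decimal points).
CKD_CODES = {"5851", "5852", "5853", "5854", "5855", "5856", "5859",
             "N18", "N181", "N182", "N183", "N184", "N185", "N186", "N189"}

def select_cohort(stays):
    """Keep each CKD patient's first ICU stay, then apply the exclusions.

    `stays` is a list of dicts with hypothetical keys: subject_id, intime,
    outtime (datetime), age (years), and icd_codes (set of code strings).
    """
    first_stay = {}
    for stay in stays:
        if not (stay["icd_codes"] & CKD_CODES):
            continue  # no CKD diagnosis code recorded
        prev = first_stay.get(stay["subject_id"])
        if prev is None or stay["intime"] < prev["intime"]:
            first_stay[stay["subject_id"]] = stay  # earliest stay wins
    cohort = []
    for stay in first_stay.values():
        if stay["age"] < 18:
            continue  # exclude patients under 18
        if stay["outtime"] - stay["intime"] < timedelta(hours=24):
            continue  # exclude ICU stays shorter than 24 h
        cohort.append(stay)
    return sorted(cohort, key=lambda s: s["subject_id"])

# Made-up records for illustration only.
t0 = datetime(2150, 1, 1)
demo = [
    {"subject_id": 1, "intime": t0, "outtime": t0 + timedelta(hours=50),
     "age": 70, "icd_codes": {"5853"}},
    {"subject_id": 1, "intime": t0 + timedelta(days=30),
     "outtime": t0 + timedelta(days=33), "age": 70, "icd_codes": {"5853"}},
    {"subject_id": 2, "intime": t0, "outtime": t0 + timedelta(hours=10),
     "age": 64, "icd_codes": {"N184"}},
    {"subject_id": 3, "intime": t0, "outtime": t0 + timedelta(hours=80),
     "age": 55, "icd_codes": {"I10"}},
]
cohort = select_cohort(demo)
```

In this toy input, only the first stay of patient 1 survives: the second stay is a repeat admission, patient 2 stayed less than 24 h, and patient 3 has no CKD code.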

Data collection
This study identified candidate variables for the model based on clinical expertise and previous studies [1]. We used Navicat Premium to extract the demographic and clinical data from the MIMIC-IV database. Age, gender, weight, ethnicity, and admission type were collected as demographic factors. Medical conditions included congestive heart failure, peptic ulcer disease, myocardial infarction, peripheral vascular disease, diabetes, dementia, chronic pulmonary disease, rheumatic disease, cerebrovascular disease, cancer, paraplegia, liver disease, and acquired immune deficiency syndrome. Vital signs, including heart rate, mean arterial pressure, respiratory rate, temperature, and oxygen saturation, were averaged over the first 24 h after admission to the ICU. Laboratory results included hematocrit, hemoglobin, platelets, white blood cell count, blood urea nitrogen, anion gap, international normalized ratio, serum creatinine, serum glucose, serum calcium, serum chloride, bicarbonate, serum potassium, serum sodium, partial thromboplastin time, and prothrombin time, all of which were the maximum values within 24 h of admission to the ICU. Urine volume was recorded as the total over the first 24 h after admission to the ICU. In addition, we recorded medical treatments such as renal replacement therapy, vasopressor use, and mechanical ventilation during the first 24 h following ICU admission. As illness severity scores, we determined the sequential organ failure assessment (SOFA) score and the first value of the simplified acute physiology score II (SAPS II) during the first 24 h following ICU admission. We also collected the patients' CKD stage and estimated glomerular filtration rate (eGFR). Comorbidities for this study were defined according to ICD-9 and ICD-10 codes [15].

Endpoints
The endpoint of this study was in-hospital mortality.
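The three aggregation rules described above (mean for vital signs, maximum for laboratory values, total for urine output, all over the first 24 h) can be illustrated as follows. This is a minimal sketch: the event layout, the variable names, and the subset of variables are hypothetical and were chosen only for the example.

```python
from statistics import mean

# One aggregation rule per variable, following the text: vital signs are
# averaged, laboratory values take the maximum, urine output is summed.
# The variable names are illustrative, not MIMIC-IV item labels.
AGGREGATORS = {
    "heart_rate": mean,
    "resp_rate": mean,
    "creatinine": max,
    "bun": max,
    "urine_output": sum,
}

def first_24h_features(events, icu_intime_h=0.0):
    """Collapse time-stamped measurements into one value per variable.

    `events` is a list of (hours, variable_name, value) tuples, where
    `hours` is measured on the same clock as `icu_intime_h`.
    """
    window = {}
    for t, name, value in events:
        in_window = icu_intime_h <= t < icu_intime_h + 24
        if in_window and name in AGGREGATORS:
            window.setdefault(name, []).append(value)
    return {name: AGGREGATORS[name](vals) for name, vals in window.items()}

# Made-up measurements; the heart rate at hour 30 falls outside the window.
events = [
    (1, "heart_rate", 90), (5, "heart_rate", 110), (30, "heart_rate", 150),
    (2, "creatinine", 2.1), (20, "creatinine", 3.4),
    (6, "urine_output", 200), (18, "urine_output", 350),
]
features = first_24h_features(events)
```

Variables with no measurement in the window are simply absent from the result, which is where the missing values handled in the preprocessing step would arise.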

Preprocessing of data
Fewer than 20% of values were missing for any variable in this study (Supplementary Table S1). Multiple imputation is well suited to handling missing data at rates below 20%. It creates several plausible, fully imputed datasets, which are first analyzed individually, and the results are then combined into a single result. We created 20 fully imputed datasets using Python's 'miceforest' package, and the estimated parameters were then pooled with Rubin's rules [16].
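For readers unfamiliar with the pooling step, Rubin's rules for a single scalar parameter can be written out directly. The sketch below is not the authors' code and does not call the imputation package; the five estimates and variances are made-up numbers standing in for the per-dataset results.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: point estimate obtained from each imputed dataset
    variances: squared standard error obtained from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Illustrative inputs for m = 5 imputed datasets (the study used 20).
est = [0.50, 0.54, 0.46, 0.52, 0.48]
var = [0.010, 0.012, 0.011, 0.009, 0.008]
pooled, total_var = pool_rubin(est, var)
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across datasets, which is how the uncertainty due to missing data enters the pooled result.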

Statistical analysis
Because the continuous variables in this study were not normally distributed, they were expressed as the median and interquartile range (IQR), and the Mann-Whitney test was used to determine differences between groups. Categorical variables were expressed as numbers and percentages, and group comparisons were made using the chi-square test or Fisher's exact test, as appropriate.
Statistical analyses were performed using R software (version 4.2.1) (R Foundation for Statistical Computing, Vienna, Austria) and Python (version 3.9.12). A p value < .05 was considered statistically significant.

Machine learning
This study performed a hierarchical fivefold cross-validation to obtain the training and validation sets. The study population was randomly divided into five subsets. Four of these subsets (80%) were combined as the training set, while the remaining subset (20%) served as the validation set, and this process was repeated five times for each outcome.
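The fivefold split can be sketched as below. The example assumes that "hierarchical" refers to stratification by outcome, so that each fold preserves the overall mortality rate; the source does not state this explicitly, and the function name and label vector are illustrative.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class ratios match the full data."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin within each class
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

# Made-up outcome vector: 20 deaths and 80 survivors.
labels = [1] * 20 + [0] * 80
splits = list(stratified_kfold(labels, k=5))
```

Each of the five validation folds here contains 20 patients, 4 of whom died, and every patient appears in exactly one validation fold.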
This study used six ML techniques, namely logistic regression, support vector machine (SVM), k-nearest neighbor (KNN), decision tree, random forest (RF), and eXtreme Gradient Boosting (XGBoost), to develop and validate models for the risk of death in critically ill patients with CKD. Logistic regression is a classification model. We included logistic regression as a conventional comparator for the other ML methods because it does not require hyperparameter optimization and is thus easier to implement. SVM is a binary linear classifier. It separates classes by establishing a decision boundary between them and maximizing the distance between the hyperplane and the nearest boundary points, and it can achieve reasonable accuracy on small data sets when predicting the label of one or more feature vectors. KNN is one of the most basic and simple ML algorithms. It can be used for both classification and regression, and it performs classification by measuring the distances between feature values. The decision tree is a single base classifier consisting of nodes and edges. Starting from the root node, also known as the first split point, each split divides the dataset according to a computed criterion. The process continues from top to bottom until no more partitioning is required, and the leaves at the end of the decision tree represent the final partitions. RF is an ensemble learning method that aims to overcome the drawbacks of a single base prediction model and achieve higher accuracy. It comprises multiple decision trees, each fitted to a sub-dataset drawn from the same dataset. XGBoost builds K regression trees so that the combined prediction of the tree ensemble is as close as possible to the true value while generalizing as well as possible.
The objective function of XGBoost requires the prediction error to be as small as possible, the number of leaf nodes to be as small as possible, and the values assigned to the leaf nodes to be as small (non-extreme) as possible.
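For reference, the regularized objective in its standard form, as given in the XGBoost documentation rather than in this study, makes these requirements explicit. Here l is the loss between the observed outcome y_i and the prediction, f_k is the k-th of K trees, T is the number of leaves in a tree, w_j is the value of leaf j, and γ and λ are regularization hyperparameters:

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}
```

The first sum penalizes prediction error, the term γT penalizes the number of leaves, and the squared-weight term discourages extreme leaf values.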
We started each ML algorithm with its default hyperparameters. Afterward, we fine-tuned the hyperparameters with a manual grid search. Tenfold cross-validation was used to find the optimal hyperparameter settings. The predictive performance of the ML models was evaluated using accuracy, area under the curve (AUC), sensitivity, and specificity. For AUC, the 95% CI was computed with 2000 stratified bootstrap replicates. The testing AUC values of the different models were compared using the paired DeLong's test. Accuracy and AUC were used to choose the best model. To assess and compare the calibration of the predicted probabilities, we calculated the Brier score and plotted the calibration curve for each model. Net reclassification improvement (NRI) was used to assess the correct reassignment between risk categories. In addition, the best model was interpreted using SHapley Additive exPlanations (SHAP) values and the Local Interpretable Model-agnostic Explanations (LIME) algorithm. Finally, a sensitivity analysis of the results was performed.
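Three of the metrics named above can be written out in a few lines to make their definitions concrete. This is a sketch with made-up outcomes and predicted probabilities, not the authors' evaluation code: AUC is computed as the fraction of positive-negative pairs ranked correctly, the confidence interval uses a percentile bootstrap that resamples deaths and survivors separately, and DeLong's test is not implemented here.

```python
import random

def auc(y_true, y_prob):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def stratified_bootstrap_ci(y_true, y_prob, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for AUC, resampling each class with replacement."""
    rng = random.Random(seed)
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    stats = []
    for _ in range(n_boot):
        bp = rng.choices(pos, k=len(pos))
        bn = rng.choices(neg, k=len(neg))
        stats.append(auc([1] * len(bp) + [0] * len(bn), bp + bn))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up outcomes (1 = died) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.6, 0.15, 0.4, 0.7, 0.8, 0.55]
point_auc = auc(y_true, y_prob)
point_brier = brier(y_true, y_prob)
ci_low, ci_high = stratified_bootstrap_ci(y_true, y_prob)
```

In the toy data, 22 of the 24 positive-negative pairs are ordered correctly. A lower Brier score indicates predicted probabilities closer to the observed outcomes.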

Class imbalance
The in-hospital mortality rate of CKD patients in this study was 16.5%. As the performance of the ML model may be affected by class imbalance, we performed a complementary analysis using an up-sampling approach.
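The text does not say which up-sampling method was used. One common choice, random resampling of the minority class with replacement, is sketched below as an assumption; the function name and data are illustrative.

```python
import random

def upsample_minority(rows, labels, seed=0):
    """Resample smaller classes with replacement until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        # Draw only the shortfall; the majority class gets no extra rows.
        extra = rng.choices(members, k=target - len(members))
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels

# Made-up data with roughly the imbalance reported in the text (16 of 100).
rows = list(range(100))
labels = [1] * 16 + [0] * 84
bal_rows, bal_labels = upsample_minority(rows, labels)
```

To avoid leaking duplicated patients into the evaluation, resampling of this kind is normally applied to the training folds only, never to the validation fold.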

Results
Participants
A total of 16,751 individuals were found to have CKD and be eligible to participate; however, 6314 were disqualified for non-first ICU admissions, and 3167 were disqualified due to having an ICU stay of fewer than 24 h. In the end, 8527 patients were eligible for the study (Figure 1). Among ICU-admitted CKD patients, the in-hospital death rate was 16.5% (1406/8527). The median age of these patients was 75.1 (IQR: 65.0–83.5) years, and 61.7% (5259/8527) were male. Congestive heart failure (4543/8527, 53.3%), diabetes mellitus (4222/8527, 49.5%), and sepsis (3516/8527, 41.2%) were the top three comorbidities. Table 1 provides a summary of the basic characteristics of the data set.

Model development and validation
We developed six ML models using clinical variables as input factors, including logistic regression, SVM, KNN, decision tree, RF, and XGBoost. Compared to the other ML models, the XGBoost model performed the best, with an AUC of 0.860 (logistic regression: 0.841; SVM: 0.751; KNN: 0.641; decision tree: 0.601; RF: 0.834) (Figure 2(A)). The AUC of the XGBoost model was higher than that of the other five models (p < .001) (Supplementary Table S2). Similarly, the XGBoost model outperformed the clinical disease severity scores (SOFA score (AUC): 0.762; SAPS II (AUC): 0.768) (Figure 2(B)). Table 2 displays the further performance of these six ML models in terms of their precision, sensitivity, specificity, Brier score, and NRI. Calibration plots for the six ML models are shown in Supplementary Figure S1.

Model explainability
By using SHAP values, we aimed to elucidate the mortality prediction process of the XGBoost model. Figure 3 depicts the feature importance ranking of the XGBoost model with SHAP summary plots, where SOFA score, SAPS II, respiratory rate, and urine output are the four factors that contribute most to the model. In addition, we used SHAP dependence analysis to illustrate the effect of a single input variable on the final results of the XGBoost prediction model (Figure 4). Figure 5 shows the results of a more in-depth analysis of the four most influential clinical characteristics of the XGBoost prediction model output. In addition, we used the LIME algorithm to explain the individualized prediction of death by taking two samples (one survivor and one deceased patient) from the validation set (Supplementary Figure S2).
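The idea behind SHAP values can be made concrete with an exact Shapley computation on a toy model. The sketch below is not the study's XGBoost model and does not use the SHAP library: the risk function, its coefficients, the patient, and the baseline are all hypothetical, and features left out of a coalition are simply replaced by baseline values.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features absent from a coalition take their baseline value.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

def toy_risk(z):
    """Hypothetical risk score with an interaction term; not a fitted model."""
    sofa, urine_l, resp = z
    return 0.05 * sofa - 0.10 * urine_l + 0.01 * resp + 0.02 * sofa * (urine_l < 0.5)

x = [10, 0.3, 28]         # illustrative patient: SOFA, urine output (L), respiratory rate
baseline = [4, 1.5, 18]   # illustrative reference patient
phi = shapley_values(toy_risk, x, baseline)
```

The contributions add up to the difference between the patient's prediction and the baseline prediction, which is the additivity property that SHAP summary and dependence plots rely on. Exact enumeration grows exponentially with the number of features, so practical implementations for tree models use faster algorithms.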

Sensitivity analyses
For patients with non-first ICU admissions (N = 6314), the XGBoost model remained robust in predicting mortality (AUC: 0.821). Detailed results are shown in Supplementary Figure S3.

Class imbalance
The up-sampling approach yielded very similar performance results (Supplementary Table S3).

Discussion
In this investigation, we constructed and tested six ML models for predicting in-hospital mortality in critically ill patients with CKD. The XGBoost model outperformed other models (including logistic regression, SVM, KNN, decision tree, and RF models) and traditional risk scores (including SOFA score and SAPS II) in predicting the death of critically ill patients with CKD. According to the feature importance evaluation, the four most important features of the XGBoost model that had the greatest predictive potential for mortality were the SOFA score, SAPS II, respiratory rate, and urine output. Moreover, we explained how these characteristics impact the XGBoost model. These results may contribute significantly to understanding ML models for predicting death in critically ill patients with CKD.
In recent years, CKD has profoundly impacted the prognosis and treatment options for several morbidities [17][18][19]. Furthermore, as the prevalence of CKD continues to rise in the general population and among ICU patients, preexisting CKD may drastically alter the treatment methods for these patients when admitted to the ICU [20][21][22]. Therefore, to identify those at high risk of clinical deterioration and facilitate early preventive measures that may reduce mortality, it is necessary to develop and promote prediction models that can predict death early and swiftly in critically ill patients with CKD.
In this analysis, the XGBoost model outperformed the other ML models in predicting mortality in CKD patients in critical care. The results of this study agree with those of numerous others. Liu et al. revealed that the XGBoost model outperformed other ML models, including logistic regression, RF, and SVM, in predicting death in acute kidney injury patients [23]. Hu et al. discovered that XGBoost performed better than SVM, KNN, logistic regression, decision tree, Naive Bayes, and RF [11]. A meta-analysis indicated that XGBoost outperformed other ML methods (such as SVM and Bayesian networks) for predicting acute kidney injury [24].
In addition, traditional severity scoring systems, such as the SOFA and SAPS II scores, performed poorly compared to the ML models, indicating that used alone they may not be reliable tools for predicting death in critically ill patients with CKD. Although the SOFA and SAPS II scoring systems may estimate the likelihood of bad outcomes in critically ill patients, they exclude a significant number of relevant factors, which may result in less accurate prediction than multivariable models [25]. Previous research has also demonstrated that the SOFA score and SAPS II have poorer predictive performance than ML models [8].
In this investigation, we used the ML method for the first time to predict in-hospital mortality in critically ill patients with CKD. By ranking the importance of variables in the XGBoost model, we found that SOFA score, SAPS II, respiratory rate, and urine output were the variables that contributed most to predicting mortality in critically ill patients with CKD. The SOFA score is a tool that describes the presence of organ dysfunction [26]. It assigns each of the six organ systems (respiratory, circulatory, renal, hematologic, hepatic, and central nervous system) a daily score between 0 and 4 based on the severity of organ failure, with higher values indicating more severe organ dysfunction [27]. Some studies have shown that high SOFA scores are associated with higher mortality [28]. Similarly, the present research found that the SOFA score was the most significant predictor of death in critically ill patients with CKD, and it was given the highest weight in the XGBoost model. SAPS II is another essential factor that influences mortality. The SAPS II score comprises seventeen factors, with higher scores indicating greater illness severity [29]. Previous research has shown that SAPS II is related to a greater death rate among ICU patients [30]. In addition, we discovered that respiratory rate is a significant predictor of death in critically ill patients with CKD. Several studies have found an association between respiratory rate and worse outcomes [31]. Our research also showed a correlation between urine output and death among CKD patients in critical care. Oliguria is common in ICU patients and is the ultimate cause of renal parenchymal damage [32]. Some studies have shown that decreased urine output is associated with poor outcomes in critically ill patients [33].
However, this study also has some shortcomings. First, this was retrospective modeling research conducted at a single center using the MIMIC-IV database, and we could not identify causal associations between characteristics and outcomes. Further prospective randomized clinical trials are needed to verify the accuracy of our approach. Second, the retrospective and observational design of our research may inevitably result in selection bias. Third, we estimated missing data using imputation, which may have led to discrepancies from the actual values. Finally, in this work, the model was only tested internally; external validation at multiple centers is needed to confirm the usefulness of the model.

Conclusions
In conclusion, we successfully developed and validated ML models for predicting mortality in critically ill patients with CKD. Among all ML models, the XGBoost model is the most effective ML model that can help clinicians accurately manage and implement early interventions, which may reduce mortality in critically ill CKD patients with a high risk of death.

Ethical approval
MIMIC-IV was set up with the approval of the Institutional Review Board at the Massachusetts Institute of Technology. All participant data were anonymized to safeguard their privacy. Due to the use of anonymized health records, ethical approval was not required. This study adheres to the ethical criteria outlined in the Helsinki Declaration of 1964.

Consent form
Due to the use of anonymized health records, informed consent was not required.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The datasets presented in the current study are available in the MIMIC-IV database (https://physionet.org/content/mimiciv/1.0/).